Conversation
python/mxnet/optimizer/optimizer.py
@@ -781,6 +784,240 @@ def update(self, index, weight, grad, state):
        ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
                    lr=lr, wd=wd, **kwargs)

@register
class SGDwFastLARS(Optimizer):
Is this different from LBSGD, defined here - https://github.com/apache/incubator-mxnet/pull/16122/files#diff-0c893416e9e93fbd94dfaa9fa6c13d67R1022?
The current LBSGD implementation is buggy; maybe this should replace it rather than create SGDwFastLARS?
Thank you for the quick review.
"LB" stands for "large batch", suggesting that the optimizer bundles several techniques. Warmup can be implemented directly with an lr_scheduler (and, contrary to what the LBSGD name suggests, LARS is not a warmup strategy but an optimizer). I think the only reason to keep such a name would be to offer a choice between the LARS and LARC optimizers (LARC will come in another PR), or other large-batch techniques.
It could make sense, since both implementations are very similar, although I'm not sure it's ergonomic to select the optimizer through an argument of Optimizer.
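To illustrate what I mean by handling warmup in the lr_scheduler rather than in the optimizer, a hand-rolled scheduler could look roughly like this (just a sketch, not code from this PR; newer MXNet schedulers may already expose warmup arguments directly, and the hyperparameter values are only examples):

```python
import mxnet as mx
from mxnet import lr_scheduler

class LinearWarmup(lr_scheduler.LRScheduler):
    """Linearly ramp the LR from 0 to base_lr over `warmup_steps` updates,
    then hand off to an optional wrapped scheduler."""
    def __init__(self, base_lr, warmup_steps, after=None):
        super(LinearWarmup, self).__init__(base_lr=base_lr)
        self.warmup_steps = warmup_steps
        self.after = after  # e.g. a PolyScheduler for the post-warmup phase

    def __call__(self, num_update):
        if num_update < self.warmup_steps:
            return self.base_lr * float(num_update) / self.warmup_steps
        return self.after(num_update) if self.after is not None else self.base_lr

# Any optimizer picks this up through the base Optimizer's lr_scheduler argument.
opt = mx.optimizer.SGD(learning_rate=0.6, momentum=0.9,
                       lr_scheduler=LinearWarmup(0.6, 500))
```

This keeps the warmup policy out of the optimizer itself, which is the separation of concerns I'm arguing for.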
Yes, I am not suggesting a parameter that switches the optimizer from LARS to LARC. Maybe we could just deprecate LBSGD (or mark it for deprecation), since it does not work.
I think deprecation is the best option. Reviewers, please thumb this comment up or down to share your opinion.
I am in favor of deprecating LBSGD, but I would still prefer a better name - "Fast" in "SGDwFastLARS" does not really make sense (fast compared to what?). Maybe just LARSSGD or something like that?
We could also just call them LARC or LARS.
As a point of reference, TF calls it LARSOptimizer:
https://www.tensorflow.org/api_docs/python/tf/contrib/opt/LARSOptimizer
I renamed it LARS, and also removed the redundant "lars" prefix from eta and eps.
Let me know if it looks fine like that.
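After the rename, usage would presumably look something like this (a sketch; the registered name 'lars' and the example hyperparameter values are assumptions on my side):

```python
import mxnet as mx
from mxnet.gluon import nn

# Assuming the class is registered under the name 'lars' and the arguments
# are now eta/eps (previously lars_eta/lars_eps); values are just examples.
opt = mx.optimizer.create('lars', learning_rate=0.6, momentum=0.9,
                          eta=0.001, eps=1e-9, wd=5e-5)

net = nn.Dense(10)
net.initialize()
trainer = mx.gluon.Trainer(net.collect_params(), opt)
```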
python/mxnet/optimizer/optimizer.py
@@ -781,6 +784,240 @@ def update(self, index, weight, grad, state):
        ftml_update(weight, grad, prev_d, prev_v, prev_z, out=weight,
                    lr=lr, wd=wd, **kwargs)

@register
class SGDwFastLARS(Optimizer):
    def __init__(self, momentum=0.0, lazy_update=True, lars_eta=0.001, lars_eps=0,
could you add documentation like here - https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer/optimizer.py#L727
Definitely
Is this good enough?
I'm not seeing this error on my local machine. What version of Python is on the CI?
tests/python/gpu/test_optimizer.py
    return lenet

@with_seed()
def test_lars():
How long does this test take? Should it be put in the nightly tests instead of the unit tests?
Should we add a test in python/unittest/test_optimizer.py?
It takes about 10s on my V100.
Since we are actually training a network here, we should move this to the nightly tests and have an optimizer test in python/unittest/test_optimizer.py, building a mock optimizer like we have done for the other optimizers (ref - https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_optimizer.py#L514).
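For reference, a NumPy mock in the style of the existing optimizer tests could look roughly like this (a sketch of the layer-wise rule from the LARS paper; the exact formulation in this PR may differ, and the names are illustrative):

```python
import numpy as np

def lars_update_numpy(weight, grad, mom, lr=0.1, momentum=0.9,
                      wd=0.0, eta=0.001, eps=1e-9, rescale_grad=1.0):
    """One LARS step on NumPy arrays, to compare against the MXNet op."""
    grad = grad * rescale_grad
    w_norm = np.linalg.norm(weight)
    g_norm = np.linalg.norm(grad)
    # layer-wise trust ratio; fall back to 1.0 when either norm is zero
    if w_norm > 0 and g_norm > 0:
        trust = eta * w_norm / (g_norm + wd * w_norm + eps)
    else:
        trust = 1.0
    mom[:] = momentum * mom + trust * lr * (grad + wd * weight)
    weight[:] = weight - mom
    return weight, mom
```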
So, I'm already testing the MXNet ops in https://github.com/apache/incubator-mxnet/pull/16122/files#diff-4758fb9329d438de2836db2634a8f5f7R270-R422, which are the only non-Python part of the optimizer. What my test adds is showing that the optimizer behaves properly, i.e. that it allows you to train a network with a bigger batch. A fully Python mock wouldn't help me test that (the Python could be wrong as well, say if I misunderstood the publication).
@anirudhacharya Is moving my training test to the nightly tests and just unit testing my MXNet ops in /tests/python/gpu/test_operator_gpu.py enough? If yes, I believe I'm all done for this PR :)
@Caenorst sorry, I missed seeing this. I will take another look. And yes, it would be good to move time-consuming tests to the nightly tests.
I understand that the Python mock optimizer could be wrong if our understanding of the paper is incorrect; let us try to catch that in the code review process.
Apart from the above-mentioned test issue, everything else LGTM.
I don't understand the errors in the CI, can somebody help me?
LGTM
@Caenorst Thanks a lot for your contribution!
v = v.astype('float32')
if rescale:
    v *= self.rescale_grad
norm = NDnorm(v).asnumpy()[0]
Is this intended? I thought having a blocking call here is bad.
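For context, one way to avoid the synchronization would be to keep the norm on the device with lazy NDArray ops, roughly like this sketch (a hypothetical helper, not code from this PR; the fused multi-tensor ops added here are presumably the intended fast path):

```python
import mxnet as mx

def lars_trust_ratio(weight, grad, eta=0.001, wd=0.0, eps=1e-9):
    """Layer-wise trust ratio computed entirely with lazy NDArray ops."""
    w_norm = mx.nd.norm(weight)   # stays on the device, no asnumpy() sync
    g_norm = mx.nd.norm(grad)
    return eta * w_norm / (g_norm + wd * w_norm + eps)
```

The resulting one-element NDArray could then be combined with the update via mx.nd.broadcast_mul without ever forcing a round trip to the host.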
Commits
* add MXNet operator for fast LARS
* add unit tests for fast LARS related MXNet Ops
* fix preloaded_multi_* dtype inference, add SGDwFastLARS optimizer and test (Conflicts: tests/python/gpu/test_operator_gpu.py)
* remove commented out cast from lenet5 model
* fix lint
* Add more documentation, change name of SGDwFastLARS by LARS, removing redundancy of 'lars' in the parameters
* change optimizer code to be python2 retro-compatible
* fix lint
* replace push_back by emplace_back for clang-tidy
Description
Add a fast implementation of the LARS optimizer.
Checklist
Essentials
Changes
Comments
This code has been used for the MLPerf v0.6 benchmarks. It is especially pertinent if you are training with a small local batch size, since the cost of the optimizer usually does not scale with the batch size, so its relative overhead grows as the local batch shrinks.
Credits